
feat(profiles): rollout data compression #92133


Merged
merged 1 commit into from
May 22, 2025

Conversation

john-z-yang
Member

No description provided.

@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label May 22, 2025
@john-z-yang john-z-yang marked this pull request as ready for review May 22, 2025 17:02
@john-z-yang john-z-yang requested a review from a team as a code owner May 22, 2025 17:02

codecov bot commented May 22, 2025

Codecov Report

Attention: Patch coverage is 57.14286% with 3 lines in your changes missing coverage. Please review.

⚠️ Parser warning

The parser emitted a warning. Please review your JUnit XML file:

Warning while parsing testcase attributes: Limit of string is 1000 chars, for name, we got 2083 at 1:156614 in /home/runner/work/sentry/sentry/.artifacts/pytest.junit.xml
Files with missing lines                              Patch %   Lines
src/sentry/profiles/consumers/process/factory.py      50.00%    3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #92133       +/-   ##
===========================================
+ Coverage   40.47%   87.86%   +47.39%     
===========================================
  Files       10145    10179       +34     
  Lines      582238   583484     +1246     
  Branches    22627    22627               
===========================================
+ Hits       235649   512690   +277041     
+ Misses     346137    70342   -275795     
  Partials      452      452               

@john-z-yang john-z-yang force-pushed the john/rollout-profile-compression branch from 4ee36cf to 89cef9a Compare May 22, 2025 17:33
@john-z-yang john-z-yang enabled auto-merge (squash) May 22, 2025 17:57
Member

@markstory markstory left a comment


Looks good to me 👍


- if random.random() < options.get("taskworker.try_compress.profile_metrics"):
+ if random.random() < options.get("taskworker.try_compress.profile_metrics.rollout"):
Member

No need to change this, but in the future you can use sentry.options.rollout.in_random_rollout() for these kinds of checks.
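For reference, a minimal self-contained sketch of what such a rollout helper does. The real helper lives in `sentry.options.rollout` and is backed by Sentry's options system; the `_options` dict and the hard-coded rate here are stand-ins for illustration only.

```python
import random

# Hypothetical stand-in for Sentry's options store; in the real code this
# would be options.get(...) reading from the configured options backend.
_options = {"taskworker.try_compress.profile_metrics.rollout": 0.25}

def in_random_rollout(option_name: str) -> bool:
    """Return True for roughly `rate` fraction of calls.

    Mirrors the `random.random() < options.get(option_name)` pattern used
    in this PR; the real sentry.options.rollout.in_random_rollout() wraps
    the same check behind a single named helper.
    """
    return random.random() < _options.get(option_name, 0.0)

# With a rate of 0.25, roughly a quarter of calls take the rollout path.
hits = sum(
    in_random_rollout("taskworker.try_compress.profile_metrics.rollout")
    for _ in range(10_000)
)
```

The advantage of the named helper is that every rollout check reads the same way and the "missing option means disabled" default lives in one place.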

Member Author

Ah that's good to know, thank you!

@john-z-yang john-z-yang merged commit 6dec121 into master May 22, 2025
59 checks passed
@john-z-yang john-z-yang deleted the john/rollout-profile-compression branch May 22, 2025 18:43
Comment on lines +32 to +34
b64encoded_compressed = b64encode(
zlib.compress(
message.payload.value,
Member

sorry for the drive-by review after the fact:

Is there a specific reason why you went with zlib over zstd, which is an overall better compression algorithm that should generally be preferred?
And is the base64 encoding a hard requirement because the tasks can’t handle bytes arguments?

It is just really weird that this ends up being base64(zlib(base64(msgpack))).
We could just make this zstd(msgpack), or base64(zstd(msgpack)) in case we really can’t have bytes.

By definition, base64 inflates the payload size by about 33%, and it's wasteful to pay that cost twice.
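The 33% figure follows from base64 mapping every 3 input bytes to 4 output characters; a quick self-contained check:

```python
import os
from base64 import b64encode

raw = os.urandom(3000)   # arbitrary binary payload, length divisible by 3
encoded = b64encode(raw)

# 3 bytes -> 4 output characters, so the encoded form is 4/3 the size
# (exactly, here, since no padding is needed).
ratio = len(encoded) / len(raw)
```

With a payload length that is not a multiple of 3, padding pushes the ratio slightly above 4/3.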

Member

Is there a specific reason why you went with zlib over zstd, which is an overall better compression algorithm that should generally be preferred?

zlib was in stdlib, and I didn't know we already had the necessary dependencies for zstandard.

It is just really weird that this ends up being base64(zlib(base64(msgpack))).

The current implementation doesn't have a double base64 encode. We base64-encode both the compressed and uncompressed forms so that we can measure the effect of compression, but the task payload itself is only base64-encoded once, after compression.
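A standalone sketch of that kind of before/after measurement, comparing the final encoded sizes with and without zlib. The payload here is an illustrative stand-in, not the PR's actual msgpack data, and the metric emission in the real consumer differs.

```python
import zlib
from base64 import b64encode

# Stand-in for a serialized profile chunk; real payloads come off Kafka
# as msgpack bytes. Repetitive content is typical of such payloads.
payload = b'{"profile_id": "abc", "samples": ' + b"[0, 1, 2, 3]" * 200 + b"}"

# Size of the task argument if sent uncompressed (still base64-encoded,
# since the task payload must be a str).
plain_size = len(b64encode(payload))

# Size after compressing first, then base64-encoding once.
compressed_size = len(b64encode(zlib.compress(payload)))

# Fraction of the encoded payload saved by compressing before encoding.
savings = 1 - compressed_size / plain_size
```

Encoding both variants lets you report the ratio without ever shipping the uncompressed form.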

And is the base64 encoding a hard requirement because the tasks can’t handle bytes arguments?

Yes, bytes aren't JSON encodable, so we needed a way to get a str.
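That constraint is easy to verify with the stdlib: `json.dumps` rejects bytes outright, which is why the compressed payload has to pass through base64 to become a str first.

```python
import json
import zlib
from base64 import b64decode, b64encode

compressed = zlib.compress(b"profile payload bytes")

# bytes are not JSON encodable: json.dumps raises TypeError.
try:
    json.dumps({"payload": compressed})
    raised = False
except TypeError:
    raised = True

# A base64 str, however, round-trips through JSON cleanly.
encoded = b64encode(compressed).decode("ascii")
roundtripped = json.loads(json.dumps({"payload": encoded}))["payload"]
restored = zlib.decompress(b64decode(roundtripped))
```

The cost is the ~33% base64 inflation discussed above, which a bytes-capable task transport would avoid.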

Member

Thanks for clarifying. I got confused by the double base64; re-reading this, it's clear there is no double-encoding going on :-D
